Abstract: In this paper we propose distributed storage
algorithms for large-scale wireless sensor networks. Assume a
wireless sensor network with n nodes that have limited power, 
memory, and bandwidth. Each node is capable of both sensing 
and storing data. Such sensor nodes might disappear from the 
network due to failures or battery depletion. Hence it is desirable
to design efficient schemes to collect data from these n nodes. We
propose two distributed storage algorithms (DSAs) that utilize
network flooding to solve this problem. In the first algorithm,
DSA-I, we assume that every node utilizes network flooding to
disseminate its data throughout the network using a mixing time
of approximately $O(n)$. We show that this algorithm is efficient
in terms of the encoding and decoding operations. In the second
algorithm, DSA-II, we assume that the total number of nodes
is not known to every sensor; hence dissemination of the data
does not depend on n. The encoding operations in this case take
$O(C\mu^2)$, where $\mu$ is the mean degree of the network graph and
C is a system parameter. We evaluate the performance of the
proposed algorithms through analysis and simulation, and show 
that their performance matches the derived theoretical results. 

I. Introduction 

Wireless sensor networks consist of small devices (nodes)
with limited CPU, bandwidth, and power. They can be deployed
in isolated, disaster-stricken, or obscured fields to monitor
objects and to detect fires, temperature changes, floods, and
other disaster incidents. They can also be used in areas that are
difficult to reach or where it is dangerous for a human being to
be involved. There has been extensive research on sensor
networks to improve their services, power, and operations [10].
They have attracted much attention recently because of their
variety of applications.

Assume a wireless sensor network $\mathcal{N}$ with n nodes thrown
in a field to detect fires or to measure temperatures. Those
sensors are distributed randomly and cannot maintain routing
tables or network topology. Some nodes might disappear from
the network due to failure or battery depletion. One needs
to design storage strategies to collect sensed data from those
sensors before they disappear suddenly from the network. Such
problems and their solutions have been considered in [1], [2],
[6], [7].

Distributed network storage codes such as Fountain codes
have been used along with random walks to distribute data
from a set of k sources to a set of n storage nodes, $n \gg k$; see
[1], [5]. The authors in [1], [2] studied a model for distributed
network storage algorithms for wireless sensor networks where
k sensor nodes (sources) want to disseminate their data to
n storage nodes with minimum computational complexity.
Fountain codes and random walks in graphs are used to solve
this problem, whether or not the total number of sensor and
storage nodes is known. In this paper we assume a
model where all n nodes in $\mathcal{N}$ can sense and store data. Each
sensor has a buffer of total size M. Furthermore, every sensor
can divide its buffer into m slots (small buffers), each of size
c, i.e., $m = \lfloor M/c \rfloor$.

In this paper we propose a distinct model for a wireless 
sensor network, wherein all nodes serve as sensors/sources as 
well as storage/receiver nodes. The main advantages of the 
proposed algorithms are as follows: 

i) Using analysis and simulation, we show that the encoding
operations a node performs to disseminate its data take less
computational time than in previous work.

ii) One does not need to query all nodes in the network
in order to retrieve information about all n nodes. It
suffices to query only 20% to 30% of the total nodes.

iii) One can query a single arbitrary node u in a certain
region of the network to obtain information about this
region.

II. Network Model and Assumptions 

In this section we present the network model and problem
definition. Consider a wireless sensor network $\mathcal{N}$ with n
sensor nodes that are uniformly distributed at random in a
region $A = [0, L]^2$ for some integer $L > 1$. The network
model $\mathcal{N}$ can be represented by a graph G = (V, E) with a set
of nodes V and a set of edges E. The set V represents the
sensors $S = \{s_1, s_2, \ldots, s_n\}$ that will measure information
about a specific field. Also, E represents a set of connections
(links) between the sensors S. Two arbitrary sensors $s_i$ and $s_j$
are connected if they are in each other's transmission range.

We ensure that the network is dense, meaning that with high
probability there are no isolated nodes. Let $r > 0$ be a fraction.
We say that two nodes u and v in V are connected in G if and
only if the distance between them is bounded by the design
parameter r, i.e., $0 < d(u, v) < r$.

Fig. 1. A WSN with n nodes arbitrarily and randomly distributed in a field.
A node $s_i$ determines its degree $d(s_i)$ by sending a flooding message to the
neighboring nodes.

Fig. 2. Every node $s_i$ has a buffer of size M that is divided into m small
slots. The node $s_i$ decides with a certain probability whether to accept or
reject a data item $x_{s_j}$ and where to save it in one of its buffers.

Given $u, v \in V$, we say u and v are adjacent (or u is
adjacent to v, and vice versa) if there exists a link between u
and v, i.e., $(u, v) \in E$. In this case, we also say that u and
v are neighbors. Denote by $\mathcal{N}(u)$ the set of neighbors of a
node u. The number of neighbors with a direct connection
to a node u is called the node degree of u and is denoted by
d(u), i.e., $|\mathcal{N}(u)| = d(u)$. The mean degree of a graph G is
given by

$$\mu = \frac{1}{|V|} \sum_{u \in G} d(u), \tag{1}$$

where |V| is the total number of nodes in G.
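
As a rough illustration of the network model, the following Python sketch places n nodes uniformly at random in a square of side L, links two nodes when their Euclidean distance is below r, and evaluates the mean degree of Eq. (1). The function names and parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def build_network(n, side, r, seed=None):
    """Place n nodes uniformly in [0, side]^2 and link pairs closer than r.

    Returns (positions, neighbors), where neighbors[u] is the set N(u).
    """
    rng = random.Random(seed)
    positions = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    neighbors = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Two nodes are neighbors when they are within transmission range.
            if math.dist(positions[u], positions[v]) < r:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return positions, neighbors


def mean_degree(neighbors):
    """Mean degree: (1/|V|) times the sum of d(u) over all nodes u."""
    return sum(len(nbrs) for nbrs in neighbors) / len(neighbors)


if __name__ == "__main__":
    _, nbrs = build_network(n=100, side=5.0, r=1.0, seed=1)
    print("mean degree:", mean_degree(nbrs))
```

The quadratic pair loop is adequate for a few hundred nodes; a spatial grid would be the usual choice for larger deployments.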

The Ideal Soliton distribution $\Omega_{is}(d)$ for k source blocks is
given by [8]

$$\Omega_{is}(i) = \Pr(d = i) = \begin{cases} \dfrac{1}{k}, & i = 1, \\ \dfrac{1}{i(i-1)}, & i = 2, 3, \ldots, k. \end{cases} \tag{2}$$

We will use this probability distribution in the algorithms 
developed in the next section. 
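
To make the distribution concrete, here is a minimal Python sketch that builds the Ideal Soliton probabilities for k source blocks and draws a degree from them. The function names are hypothetical; only the probabilities themselves come from Eq. (2).

```python
import random


def ideal_soliton_pmf(k):
    """Return [p_1, ..., p_k] with p_1 = 1/k and p_i = 1/(i(i-1)) for i >= 2."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [1.0 / k] + [1.0 / (i * (i - 1)) for i in range(2, k + 1)]


def sample_degree(k, rng=random):
    """Draw one degree d in {1, ..., k} from the Ideal Soliton distribution."""
    pmf = ideal_soliton_pmf(k)
    return rng.choices(range(1, k + 1), weights=pmf, k=1)[0]
```

The probabilities sum to one because the terms 1/(i(i-1)) = 1/(i-1) - 1/i telescope to 1 - 1/k.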

A. Assumptions 

We make the following assumptions about the network
model $\mathcal{N}$:

i) Let $S = \{s_1, \ldots, s_n\}$ be a set of sensing nodes that
are distributed randomly and uniformly in a field. Each
sensor acts as both a sensing and a storage node. This
assumption differentiates our work from the
problems considered in [1], [7].

ii) No node maintains routing or geographic
tables, and the network topology is not known. Every
node $s_i$ can send a flooding message to the neighboring
nodes. Also, every node $s_i$ can detect the total number
of its neighbors by broadcasting a simple query message;
whoever replies to this message is a neighbor
of this node. Therefore, our work is more general than and
different from the work done in [3], [4]. The degree d(u)
of a node u is the total number of neighbors with a direct
connection.

iii) Every node has a buffer of size M, and this buffer can
be divided into smaller slots, each of size c, such that
$m = \lfloor M/c \rfloor$. Hence, all nodes have the same number of
slots. Also, the first slot of a node u is reserved for its
own sensing data.

iv) Every node $s_i$ prepares a packet $\mathrm{packet}_{s_i}$ with its $\mathrm{ID}_{s_i}$,
sensed data $x_{s_i}$, counter $c(x_{s_i})$, and a flag that is set to
zero or one.

v) Every node draws a degree $d_u$ from a degree distribution
$\Omega_{is}$. If a node decides to accept a packet, it will also
decide in which buffer slot the packet will be stored.

III. Distributed Storage Algorithms 

In this section we will present a networked distributed 
storage algorithm for wireless sensor networks, where all 
nodes act as sensing and storage nodes, and study its encoding 
and decoding operations. 

A. Encoding Operations 

We present a distributed storage algorithm (DSA-I) for
wireless sensor networks. The DSA-I algorithm consists of three
main phases: initialization, encoding/flooding, and storage.
Each phase is described as follows.

1) Initialization Phase: Every node $s_i$ in S has an $\mathrm{ID}_{s_i}$ and
sensed data $x_{s_i}$. In the initialization phase, the node $s_i$
prepares a packet $\mathrm{packet}_{s_i}$ with these values. The packet also
contains a hop count field, $c(x_{s_i})$, and a flag indicating whether
the data is new or an update of a previous value. Each node will have
a different hop count value depending on the number of its
neighbors $d(s_i)$: if a node $s_i$ has few neighbors,
then $c(x_{s_i})$ will be large, whereas a node with a large number of
neighbors will choose a small counter $c(x_{s_i})$. This means that
every node decides its own counter.

$$\mathrm{packet}_{s_i} = (\mathrm{ID}_{s_i}, x_{s_i}, c(x_{s_i}), \mathrm{flag}) \tag{3}$$

The node broadcasts this packet to all neighboring nodes.
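
The packet layout can be sketched as a small Python data class. The field names and the integer encoding of the sensed data are assumptions made for illustration; the counter rule, the floor of n divided by the node degree, is the one used in Algorithm 1.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    node_id: int   # ID of the originating sensor
    data: int      # sensed data, modeled here as an integer bit pattern
    counter: int   # hop counter c(x)
    flag: int = 0  # 0 = new data, 1 = update of a previous value


def make_packet(node_id, data, n, degree):
    """Build the initial packet with counter floor(n / d(s_i))."""
    if degree <= 0:
        raise ValueError("an isolated node has no neighbors to flood to")
    return Packet(node_id=node_id, data=data, counter=n // degree)
```

With n = 100, a node with 8 neighbors starts with a counter of 12, while a node with 50 neighbors starts with a counter of 2, which matches the rule that sparsely connected nodes let their packets travel farther.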

2) Encoding and Flooding Phase:

• After the flooding phase, every node u receiving the
packet $\mathrm{packet}_{s_i}$ will check $\mathrm{ID}_{s_i}$, accept the data $x_{s_i}$ with
probability one, and add this data to its buffer slot y:

$$y_u^{+} = y_u^{-} \oplus x_{s_i}. \tag{4}$$

This is because the node u is a direct neighbor of $s_i$. The
data $x_{s_i}$ is disseminated rapidly to all neighbors of $s_i$.

• The node u will decrease the counter by one as

$$c(x_{s_i}) = c(x_{s_i}) - 1. \tag{5}$$

Input: A sensor network with source nodes $S = \{s_1, \ldots, s_n\}$,
    n source packets $x_{s_1}, \ldots, x_{s_n}$, and a positive constant $c(s_i)$.
Output: Storage buffers $y_1, y_2, \ldots, y_n$ for all sensors S.
foreach node u = 1 : n do
    Generate $d_c(u)$ according to $\Omega_{is}(d)$ (or $\Omega_{rs}(d)$) and a
    set of neighbors $\mathcal{N}(u)$ using flooding;
end
foreach source node $s_i$, i = 1 : n do
    Generate header of $x_{s_i}$ and token = 0;
    Set counter $c(x_{s_i}) = \lfloor n/d(s_i) \rfloor$;
    Flood $x_{s_i}$ to all $\mathcal{N}(s_i)$ uniformly at random; send $x_{s_i}$ to $u \in \mathcal{N}(s_i)$;
    With probability 1, $y_u = y_u \oplus x_{s_i}$;
    Put $x_{s_i}$ into its forward queue;
    $c(x_{s_i}) = c(x_{s_i}) - 1$;
end
while source packets remaining do
    foreach node u that received packets before the current round do
        Choose $v \in \mathcal{N}(u)$ uniformly at random;
        Send packet $x_{s_i}$ in u's forward queue to v;
        if v receives $x_{s_i}$ for the first time then
            coin = rand(1);    (flip a coin to accept or reject the packet)
            if coin $\le$ the acceptance threshold then
                $y_v = y_v \oplus x_{s_i}$;
                Put $x_{s_i}$ into v's forward queue;
                $c(x_{s_i}) = c(x_{s_i}) - 1$;
            end
        else if $c(x_{s_i}) > 1$ then
            Put $x_{s_i}$ into v's forward queue;
            $c(x_{s_i}) = c(x_{s_i}) - 1$;
        else
            Discard $x_{s_i}$;    (hence $c(x_{s_i}) = 1$ or there is no node to send to)
        end
    end
end

Algorithm 1: DSA-I Algorithm: Distributed storage algorithm
for a WSN where the data is disseminated using
multicasting and flooding to all neighbors.



The node u will select a set of neighbors that did not
receive the message $x_{s_i}$ and will unicast this message
to them.

• An arbitrary node v that receives the message from u
will check whether $x_{s_i}$ has been received before; if so,
it will discard it. If not, it will decide whether
to accept or reject it based on a random value drawn from
$\Omega_{is}(d)$. If accepted, it will add the data to one of its
buffer slots, $y_v^{+} = y_v^{-} \oplus x_{s_i}$, and will decrease the counter,
$c(x_{s_i}) = c(x_{s_i}) - 1$.

• The node v will check whether the counter is zero; otherwise it
will decrease it and send this message to the neighboring
nodes that did not receive it.

3) Storage Phase: Every node will maintain its own buffer
by storing a copy of its data and other nodes' data. Also,
a node will store a list of the node IDs of the packets that
reached it. After all nodes receive, send, and store their own
and neighboring data, each node will have some
information about itself and other nodes in the network.
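
The three phases can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the authors' implementation: each node keeps a single XOR buffer instead of m slots, the graph is undirected, sensed data are integers, and the acceptance probability is a caller-supplied constant (`accept_prob`, a hypothetical parameter) rather than a value derived from the drawn degree.

```python
import random
from collections import deque


def dsa1_encode(neighbors, data, accept_prob=0.5, seed=None):
    """Simplified DSA-I dissemination with one XOR buffer per node.

    neighbors[u]: list of neighbor indices of node u (undirected graph).
    data[u]: sensed value of node u, modeled as an int bit pattern.
    accept_prob: assumed acceptance probability (hypothetical constant).
    Returns (y, ids): XOR buffers and the source IDs mixed into each buffer.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    y = list(data)                  # every node keeps its own reading
    ids = [{u} for u in range(n)]   # IDs of packets stored at each node
    seen = [{u} for u in range(n)]  # IDs of packets that ever reached a node
    queues = [deque() for _ in range(n)]

    # Flooding: each source sends its packet to all direct neighbors,
    # which accept it with probability one.
    for s in range(n):
        if not neighbors[s]:
            continue
        counter = n // len(neighbors[s]) - 1  # floor(n/d), decremented once
        for u in neighbors[s]:
            y[u] ^= data[s]
            ids[u].add(s)
            seen[u].add(s)
            queues[u].append((s, counter))

    # Forwarding rounds: each holder passes one queued packet copy
    # to a neighbor chosen uniformly at random.
    while any(queues):
        for u in range(n):
            if not queues[u]:
                continue
            s, counter = queues[u].popleft()
            v = rng.choice(neighbors[u])
            if s not in seen[v]:
                seen[v].add(s)
                if rng.random() < accept_prob:
                    y[v] ^= data[s]
                    ids[v].add(s)
                    queues[v].append((s, counter - 1))
            elif counter > 1:
                queues[v].append((s, counter - 1))
            # otherwise the packet copy is discarded
    return y, ids
```

Every forwarding step either consumes a node that has not yet seen the packet or decrements the counter, so the loop terminates. The returned `ids` lists play the role of the per-node ID lists kept in the storage phase.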

B. Decoding Operations 

The stored data can be recovered by querying a number of
nodes from the network. Let n be the total number of alive
nodes; assume that every node has m buffer slots such that
$m = \lfloor M/c \rfloor$, where c is a small buffer size and M is the
total buffer size in a node. In the next section, we show that
the data collector needs to query at least $(1 + \epsilon)n/m$ nodes in
order to retrieve the information about the n variables. This is
much better than previous approaches [1], [2], [7], which require
querying a large number of sources.
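
As a quick numerical reading of this bound, the sketch below computes the smallest integer number of nodes satisfying it. The function name and the default value of the overhead parameter are assumptions for illustration only.

```python
import math


def min_queries(n, buffer_size, slot_size, eps=0.05):
    """Smallest integer at least (1 + eps) * n / m, with m = floor(M / c)."""
    m = buffer_size // slot_size  # number of slots per node
    if m < 1:
        raise ValueError("buffer must hold at least one slot")
    return math.ceil((1 + eps) * n / m)
```

For example, with n = 100 nodes, M = 40, c = 4 (so m = 10), and an overhead of 0.25, the bound is 12.5 and at least 13 nodes must be queried.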

IV. DSA-I Analysis 

We now analyze the DSA-I algorithm presented in
the previous section. The main idea is to utilize flooding and
the degree of each node to disseminate the sensed data
from sensors throughout the network. We note that nodes with
large degree will have smaller counters in their packets, so
that their packets will travel to a minimal number of neighbors.
Nodes with smaller degree will have larger counters, so
that their packets will be disseminated to as many neighbors as
possible. The following lemma establishes the number of hops
(steps) that every packet will travel in the network.

Lemma 1: On average, with high probability, the total
number of steps for one packet originated by a node u in one
branch in DSA-I is $O(n/\mu)$.

Proof: Let u be a node originating a packet $\mathrm{packet}_u$ with
degree d(u). For any arbitrary node v, the packet $\mathrm{packet}_u$ will
be forwarded only if it is visiting v for the first time or the counter
$c(x_u) \ge 2$. We know that every packet originated from a node
u has a counter given by

$$c(x_u) = \lfloor n/d(u) \rfloor. \tag{6}$$

Let $\mu$ be the mean degree of the graph representing the
network $\mathcal{N}$. On average, assuming every packet will be sent
to $\mu$ neighboring nodes, and approximating the degree of any
arbitrary node u by the mean degree of the graph, the result follows.

■

If the total number of nodes is not known, one can use 
the method developed in [1] to estimate n. In other words, a 
random walk initiated by the node u can be run to estimate 
the total number of nodes. 

Lemma 2: Let $\mathcal{N}$ be an instance model of a wireless
sensor network with n sensor nodes. The total number of
transmissions required to disseminate the information from
any arbitrary node throughout the network is $O(n)$.

Proof: Let $d(s_i)$ be the degree of a sensor node $s_i$.
On average, $\mu$ is the mean degree of the set of sensors S,
approximated by $\frac{1}{n}\sum_{i=1}^{n} d(s_i)$. Every node performs flooding
that takes $O(1)$ running time to its $d(s_i)$ neighbors. In order to
disseminate information from a sensor $s_i$, at least $n/\mu$ steps
are needed by Lemma 1. Also, every sensor $s_i$ needs to
send $\mu$ messages on average to the neighbors. Hence the result
follows. ■

Note that this is much better than the previous results shown
in [1], which take $n \log n$, where n is the number of sources.

Theorem 1: The encoding operations of the DSA-I algorithm,
measured as the total number of transmissions required to
disseminate the information sensed by all nodes, are $O(n^2)$.
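
One way to see how this bound could follow from Lemmas 1 and 2 is the back-of-the-envelope count below. It assumes that every node has degree close to the mean degree $\mu$ and that all n nodes act as sources; the symbol T is introduced here only for illustration.

```latex
\begin{align*}
T_{\text{one source}} &\approx
  \underbrace{\frac{n}{\mu}}_{\text{steps (Lemma 1)}} \times
  \underbrace{\mu}_{\text{messages per step}} = O(n), \\
T_{\text{all sources}} &\approx n \cdot O(n) = O(n^{2}).
\end{align*}
```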

V. DSA-II Algorithm Without Knowing Global Information

In algorithm DSA-I we assumed that the total number of
nodes is known in advance to each sensing/storing node in
the network. This might not be the case, since arbitrary nodes
might join and leave the network at various times due to the
fact that they have limited CPU and a short lifetime. Therefore,
one needs to design a network storage algorithm that does not
depend on the total number of nodes.

We extend DSA-I to obtain a distributed storage algorithm
(DSA-II) that is fully distributed and requires no global
information. The idea is that each node u will estimate a value
for its counter c(u), the hop count, without knowing n. In
DSA-II each node u will first perform an inference phase that
calculates the value of the counter c(u). This can be achieved
using the degree of u and the degrees of the neighboring nodes
$\mathcal{N}(u)$. We also assume a parameter $c_u$ that will depend on the
network condition and the node's degree.

Inference Phase: Let u be an arbitrary node in a distributed
network $\mathcal{N}$. In the inference phase, each node u will dynamically
determine the value of the counter c(u). The node u knows
its neighbors $\mathcal{N}(u)$; this is achieved in the flooding phase.
Furthermore, each node v in $\mathcal{N}(u)$ knows the degrees of its
neighbors.

The inference phase is done dynamically in the sense that 
every node in the network will independently decide a value 
for its counter. Nodes with large degrees will have a high 
chance of forwarding their data throughout the network to a 
large number of nodes. 

Let v be a node connected to a source node u. Let $b_v$ be
the degree of a node v without counting nodes in $\mathcal{N}(u) \cup \{u\}$. We
can approximate the counter c(u) as

$$c(u) = c_u \frac{1}{d(u)} \sum_{v \in \mathcal{N}(u)} b_v. \tag{7}$$

Once the hop count c(u) is approximated at each node u,
the encoding operations of the DSA-II algorithm are similar to
the encoding operations of the DSA-I algorithm.
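
The inference step can be sketched in Python using only information local to u. The sketch assumes that the counter is the system parameter $c_u$ times the average of $b_v$ over the neighbors v of u; this normalization, the function name, and the default parameter value are assumptions made for illustration.

```python
def infer_counter(u, neighbors, c_u=1.0):
    """Estimate the hop counter c(u) from local degree information only.

    neighbors[v] is the set N(v). For each neighbor v of u, b_v counts the
    neighbors of v that are neither u nor neighbors of u. Averaging b_v
    over N(u) is an assumed normalization.
    """
    if not neighbors[u]:
        return 0.0  # an isolated node has nothing to forward to
    own = set(neighbors[u]) | {u}
    b = [len(set(neighbors[v]) - own) for v in neighbors[u]]
    return c_u * sum(b) / len(b)
```

On a path 0-1-2-3-4, node 2 sees one new node behind each of its two neighbors, so the estimate is $c_u$ times 1. For the center of a star, every $b_v$ is zero, and the estimate is zero because no node lies beyond the direct neighbors.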

Lemma 3: Let $\mathcal{N}$ be a sensor network with n sensor
nodes uniformly distributed. The total number of transmissions
required to disseminate the information from any arbitrary
node throughout the network for DSA-II is given by

$$O(\mu(\mu - \lambda)), \tag{8}$$

where $\lambda$ is the average node density [9].



VI. Practical Aspects 

In this section we provide an evaluation and comparison
of the DSA-I and DSA-II algorithms and related
work on distributed storage algorithms. Previous work focused
on utilizing random walks and Fountain codes to disseminate
data sensed by a set of sensors throughout the network. Also,
global and geographical information, such as the total
number of nodes, routing tables, and node locations, was used.

In this work, we disseminate data throughout the network
using data flooding once at every sensor node, then adding
some redundancy at other neighboring nodes using random
walks and packet trapping. Every storage node will keep track
of the IDs of other nodes, from which it will accept or reject packets.

The main advantages of the proposed algorithms are as
follows:

i) One does not need to query all nodes in the network
in order to retrieve information about all n nodes. It
suffices to query only 20% to 30% of the total nodes.

ii) One can query a single arbitrary node u in a certain
region of the network to obtain information about this
region.

iii) The DSA-I and DSA-II algorithms proposed in this paper
are superior to the CDSA-I and CDSA-II
storage algorithms based on Fountain and Raptor codes
proposed in [1], [2]. The latter utilize random walks to
disseminate the information from a set of sources to a set
of storage nodes.

The proposed algorithms also work in the case of data
updates. Assume a node u sensed data $x_u$ that has been
disseminated throughout the network using flooding as shown
in the DSA-I and DSA-II algorithms. In this case the flag value
is set to zero, and a packet from the node u is originated as
follows:

$$\mathrm{packet}_u = (\mathrm{ID}_u, x_u, c(x_u), \mathrm{flag}). \tag{9}$$

We note that every node v that stores a copy of this data $x_u$
will also maintain a list of IDs including $\mathrm{ID}_u$. Let $x'_u$ be
the new sensed data from the node u. The node u will send
an update message with the flag set to one:

$$\mathrm{packet}_u = (\mathrm{ID}_u, x_u \oplus x'_u, c(x_u), \mathrm{flag}). \tag{10}$$

The new and old data are XORed in this packet. Every storage
node will check the flag to see whether it is an update or an initial
packet. Also, the node v will check whether $\mathrm{ID}_u$ is in its own list.
Once a node v accepts the incoming update packet, it will update
its target buffer as

$$y_v^{+} = y_v^{-} \oplus x_u \oplus x'_u. \tag{11}$$
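
The update rule works because XORing a value twice cancels it. The short Python sketch below illustrates this with integer bit patterns; the function name and the tuple layout of the update packet are assumptions for illustration.

```python
def apply_update(buffer, stored_ids, packet):
    """Apply an update packet (node_id, payload, flag) to one storage buffer.

    For an update (flag == 1) the payload is old_data XOR new_data. The
    buffer changes only if this node previously stored data from node_id.
    """
    node_id, payload, flag = packet
    if flag == 1 and node_id in stored_ids:
        return buffer ^ payload
    return buffer


if __name__ == "__main__":
    x_old, x_new, other = 0b1010, 0b0110, 0b0001
    buffer = other ^ x_old  # buffer currently mixes two readings
    updated = apply_update(buffer, {7}, (7, x_old ^ x_new, 1))
    print(updated == other ^ x_new)  # the old reading is replaced by the new one
```

A node that never stored data from the sender leaves its buffer untouched, which is why each storage node keeps the list of IDs it has accepted.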

Fig. 3. A WSN with n nodes arbitrarily and randomly distributed in a field.
The successful decoding ratio is shown for various values of n = 50, 100, 150
with the DSA-I algorithm.



VII. Performance and Simulation Results 

In this section we simulate the distributed storage algorithm
DSA-I presented in Section III. The main performance
metric we investigate is the successful decoding probability
versus the decoding ratio. We define the successful decoding
probability p as the ratio of the number $M_s$ of successful trials
that recover all n variables (symbols) to the total number of trials.
We define h to be the total number of queries needed to recover
those n variables. Also, we define the decoding ratio as
the number of queried nodes divided by n, i.e., $\eta = h/n$.
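
These metrics can be estimated with a short Python sketch. The paper does not spell out its decoder, so the sketch assumes one standard test: the queried buffers, each an XOR of known source IDs, determine all n symbols exactly when their incidence vectors have rank n over GF(2). All function names are hypothetical.

```python
def gf2_rank(rows):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced vector with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]  # eliminate the leading bit and keep reducing
    return len(basis)


def can_decode(id_sets, n):
    """True if buffers built from these ID sets determine all n symbols."""
    masks = [sum(1 << s for s in ids) for ids in id_sets]
    return gf2_rank(masks) == n


def success_probability(trial, trials=200):
    """Fraction of successful trials; trial() returns True on success."""
    return sum(bool(trial()) for _ in range(trials)) / trials
```

For instance, buffers holding {0}, {0, 1}, and {1, 2} are decodable for n = 3, whereas {0, 1}, {1, 2}, and {0, 2} are not, because the third combination is the XOR of the first two. A rank test corresponds to Gaussian elimination and is therefore at least as permissive as a peeling decoder.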

We ran the experiment over a network deployed on a grid with
area $A = [0, L]^2$ and with different node densities. We evaluated the
performance with various decoding ratios, depending on the
total number of nodes inside the network, with an incremental
step of 0.1.

Fig. 3 shows the decoding performance of the DSA-I algorithm
with the Ideal Soliton distribution for a small number of nodes. We
ran the experiment over a network with area $A = [0, 2]^2$
and evaluated the performance with various decoding ratios
$0.1 \le \eta \le 1$. From these results we can see that the successful
decoding probability increases gradually with
the decoding ratio $\eta$ and reaches its upper bound when $\eta \ge 30\%$.

Fig. 4 shows the decoding performance of the DSA-I algorithm
with the Ideal Soliton distribution for a large number of nodes.
The network is deployed in $A = [0, 5]^2$. From the simulation
results we can see that the successful decoding probability increases
with the increase of A and approaches 1 for $\eta > 20\%$. Therefore
the proposed algorithms perform well for large-scale wireless
sensor networks.

Fig. 4. A WSN with n nodes arbitrarily and randomly distributed in a field.
The successful decoding ratio is shown for various values of n = 200, 400,
600 with the DSA-I algorithm.

VIII. Conclusion

We presented two distributed storage algorithms for large-scale
wireless sensor networks. Given n storage/sensing
nodes, we developed schemes to disseminate sensed data
throughout the network with lower computational overhead.
The results demonstrate that it suffices to query only 20% to 30%
of the network nodes in order to retrieve the data collected by
the n sensing nodes when the buffer size is 10% of the network
size. Our future work will include practical and implementation
aspects of these algorithms.